Neural Dual Extended Kalman Filtering: Applications in Speech Enhancement and Monaural Blind Signal Separation

Authors

  • Eric A. Wan
  • Alex T. Nelson
Abstract

The removal of noise from speech signals has applications ranging from speech enhancement for cellular communications, to front ends for speech recognition systems. A nonlinear time-domain method called dual extended Kalman filtering (DEKF) is presented for removing nonstationary and colored noise from speech. We further generalize the algorithm to perform the blind separation of two speech signals from a single recording.

INTRODUCTION

Traditional approaches to noise removal in speech involve spectral techniques, which frequently result in audible distortion of the signal. Recent time-domain nonlinear filtering methods utilize data sets where the clean speech is available as a target signal to train a neural network. Such methods are often effective within the training set, but tend to generalize poorly for actual sources with varying signal and noise levels. Furthermore, the network models in these methods do not fully take into account the nonstationary nature of speech. In the approach presented here, we assume the availability of only the noisy signal. Effectively, a sequence of neural networks is trained on the specific noisy speech signal of interest, resulting in a nonstationary model which can be used to remove noise from the given signal.

A noisy speech signal y(k) can be accurately modeled as a nonlinear autoregression with both process and additive observation noise:

$$x(k) = f(x(k-1), \ldots, x(k-M); \mathbf{w}) + v(k) \quad (1)$$
$$y(k) = x(k) + n(k), \quad (2)$$

where x(k) corresponds to the true underlying speech signal driven by process noise v(k), and $f(\cdot)$ is a nonlinear function of past values of x(k) parameterized by $\mathbf{w}$. The speech is only assumed to be stationary over short segments, with each segment having a different model. The available observation is y(k), which contains additive noise n(k). The optimal estimator given the noisy observations $\mathbf{y}^k = \{y(k), y(k-1), \ldots, y(0)\}$ is $E[x(k) \mid \mathbf{y}^k]$. The most direct way to estimate this would be to train on a set of clean data in which the true x(k) may be used as the target to a neural network. Our assumption, however, is that the clean speech is never available; the goal is to estimate x(k) itself from the noisy measurements y(k) alone. In order to solve this problem, we assume that $f(\cdot\,; \mathbf{w})$ is in the class of feedforward neural network models, and compute the dual estimation of both states $\hat{\mathbf{x}}$ and weights $\hat{\mathbf{w}}$ based on a Kalman filtering approach. In this paper we provide a basic description of the algorithm, followed by a discussion of experimental results.

DUAL EXTENDED KALMAN FILTERING

By posing the dual estimation problem in a state-space framework, we can use Kalman filtering methods to perform the estimation in an efficient, recursive manner. At each time point, the Kalman filter provides an optimal estimate by combining a prior prediction with a new observation. Connor et al. [4] proposed using an extended Kalman filter with a neural network to perform state estimation alone. Puskorius and Feldkamp [13] and others have posed the weight estimation in a state-space framework to allow for efficient Kalman training of a neural network. In prior work, we extended these ideas to include the dual Kalman estimation of both states and weights for efficient maximum-likelihood optimization (in the context of robust nonlinear prediction, estimation, and smoothing) [15]. The work presented here develops these ideas in the context of speech processing.
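Before turning to the state-space formulation, a minimal sketch may help make the signal model in (1) and (2) concrete. The Python code below is not from the paper: the function f_true, the model order M = 4, and the noise levels are hypothetical stand-ins for the trained feedforward network and real speech statistics; it simply simulates a nonlinear autoregression driven by white process noise and observed through additive white noise.

```python
import numpy as np

def simulate_noisy_speech_model(N=2000, M=4, sigma_v=0.1, sigma_n=0.3, seed=0):
    """Simulate x(k) = f(x(k-1), ..., x(k-M); w) + v(k) and y(k) = x(k) + n(k).

    f_true is a hand-picked bounded nonlinearity standing in for the paper's
    trained feedforward network; it is illustrative only and assumes M = 4.
    """
    rng = np.random.default_rng(seed)

    def f_true(x_past):
        # Shaped like a tiny one-hidden-layer tanh network over the last M samples.
        h1 = np.tanh(0.9 * x_past[0] - 0.5 * x_past[1])
        h2 = np.tanh(0.4 * x_past[2] - 0.3 * x_past[3])
        return 1.2 * h1 - 0.7 * h2

    x = np.zeros(N)
    for k in range(M, N):
        x_past = x[k - M:k][::-1]                                  # x(k-1), ..., x(k-M)
        x[k] = f_true(x_past) + sigma_v * rng.standard_normal()    # eq. (1)
    y = x + sigma_n * rng.standard_normal(N)                       # eq. (2)
    return x, y

x_clean, y_noisy = simulate_noisy_speech_model()
print(f"clean signal variance: {np.var(x_clean):.3f}, noisy: {np.var(y_noisy):.3f}")
```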
A state-space formulation of (1) and (2) is as follows:

$$\mathbf{x}(k) = F[\mathbf{x}(k-1)] + B v(k), \quad (3)$$
$$y(k) = C\mathbf{x}(k) + n(k), \quad (4)$$

where

$$\mathbf{x}(k) = \begin{bmatrix} x(k) \\ x(k-1) \\ \vdots \\ x(k-M+1) \end{bmatrix}, \qquad F[\mathbf{x}(k)] = \begin{bmatrix} f(x(k), \ldots, x(k-M+1); \mathbf{w}) \\ x(k) \\ \vdots \\ x(k-M+2) \end{bmatrix},$$

$$C = \begin{bmatrix} 1 & 0 & \cdots & 0 \end{bmatrix}, \qquad B = C^T. \quad (5)$$

If the model is linear, then $f(\mathbf{x}(k))$ takes the form $\mathbf{w}^T\mathbf{x}(k)$, and $F[\mathbf{x}(k)]$ can be written as $A\mathbf{x}(k)$, where $A$ is in controllable canonical form. We initially assume the noise terms v(k) and n(k) are white with known variances $\sigma_v^2$ and $\sigma_n^2$, respectively. Methods for estimating the noise variances directly from the noisy data are described later in this paper.

Extended Kalman Filter State Estimation

For a linear model with known parameters, the Kalman filter (KF) algorithm can be readily used to estimate the states [9]. At each time step, the filter computes the linear least squares estimate $\hat{\mathbf{x}}(k)$ and prediction $\hat{\mathbf{x}}^-(k)$, as well as their error covariances, $P_{\hat{\mathbf{x}}}(k)$ and $P_{\hat{\mathbf{x}}}^-(k)$. In the linear case with Gaussian statistics, the estimates are the minimum mean square estimates. With no prior information on x, they reduce to the maximum likelihood estimates. When the model is nonlinear, the KF cannot be applied directly, but requires a linearization of the nonlinear model at each time step. The resulting algorithm is called the extended Kalman filter (EKF), and effectively approximates the nonlinear function with a time-varying linear one. The EKF algorithm is as follows:

$$\hat{\mathbf{x}}^-(k) = F[\hat{\mathbf{x}}(k-1), \hat{\mathbf{w}}(k-1)] \quad (6)$$
$$P_{\hat{\mathbf{x}}}^-(k) = A P_{\hat{\mathbf{x}}}(k-1) A^T + B \sigma_v^2 B^T, \quad \text{where } A = \left.\frac{\partial F[\hat{\mathbf{x}}, \hat{\mathbf{w}}]}{\partial \hat{\mathbf{x}}}\right|_{\hat{\mathbf{x}}(k-1)} \quad (7)$$
$$K(k) = P_{\hat{\mathbf{x}}}^-(k) C^T \left( C P_{\hat{\mathbf{x}}}^-(k) C^T + \sigma_n^2 \right)^{-1} \quad (8)$$
$$P_{\hat{\mathbf{x}}}(k) = \left( I - K(k) C \right) P_{\hat{\mathbf{x}}}^-(k) \quad (9)$$
$$\hat{\mathbf{x}}(k) = \hat{\mathbf{x}}^-(k) + K(k)\left( y(k) - C\hat{\mathbf{x}}^-(k) \right). \quad (10)$$

Dual Extended Kalman Filter Weight Estimation

Because the model for the speech is not known, the standard EKF algorithm cannot be applied directly. We approach this problem by constructing a separate state-space formulation for the underlying weights as follows:

$$\mathbf{w}(k) = \mathbf{w}(k-1) \quad (11)$$
$$y(k) = f(\mathbf{x}(k-1); \mathbf{w}(k)) + v(k) + n(k), \quad (12)$$

where the state transition is simply an identity matrix, and the neural network $f(\mathbf{x}(k-1); \mathbf{w}(k))$ plays the role of a time-varying nonlinear observation on $\mathbf{w}$. These state-space equations for the weights allow us to estimate them with a second EKF:

$$\hat{\mathbf{w}}^-(k) = \hat{\mathbf{w}}(k-1) \quad (13)$$
$$P_{\hat{\mathbf{w}}}^-(k) = P_{\hat{\mathbf{w}}}(k-1) \quad (14)$$
$$K_{\hat{\mathbf{w}}}(k) = P_{\hat{\mathbf{w}}}^-(k) H(k)^T \left( H(k) P_{\hat{\mathbf{w}}}^-(k) H(k)^T + \sigma_n^2 + \sigma_v^2 \right)^{-1} \quad (15)$$
$$P_{\hat{\mathbf{w}}}(k) = \left( I - K_{\hat{\mathbf{w}}}(k) H(k) \right) P_{\hat{\mathbf{w}}}^-(k) \quad (16)$$
$$\hat{\mathbf{w}}(k) = \hat{\mathbf{w}}^-(k) + K_{\hat{\mathbf{w}}}(k)\left( y(k) - C F[\hat{\mathbf{x}}(k-1), \hat{\mathbf{w}}^-(k)] \right), \quad (17)$$

where

$$H(k) = C \left.\frac{\partial F[\hat{\mathbf{x}}, \hat{\mathbf{w}}]}{\partial \hat{\mathbf{w}}}\right|_{\hat{\mathbf{w}}(k-1)}. \quad (18)$$

The linearization in (18) can be computed as a dynamic derivative [16] to account for the recurrent nature of the state-estimation filter, including the dependence of the Kalman gain K(k) on the weights. The calculation of these derivatives is computationally expensive, and can be avoided by ignoring the dependence of $\hat{\mathbf{x}}(k)$ on $\hat{\mathbf{w}}$.¹ This approximation was used to produce the results in this paper. The use of the full derivatives is currently being investigated by the authors.

¹This is equivalent to a single step of backpropagation through time [16].

[Figure: block diagram of the dual EKF, in which the measurement update of EKF1 (states) and the measurement update of EKF2 (weights) run in parallel, mapping x̂(k-1) to x̂(k).]
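As a concrete illustration of how equations (6)-(10) and (13)-(17) interlock, the following NumPy sketch (not from the paper) performs one combined dual-EKF time step. It assumes a small one-hidden-layer tanh network standing in for $f(\cdot\,;\mathbf{w})$; the names (TinyMLP, dual_ekf_step), layer sizes, and flat weight layout are hypothetical choices of this sketch, and H(k) is formed while ignoring the dependence of x̂ on ŵ, matching the approximation described above.

```python
import numpy as np

class TinyMLP:
    """One-hidden-layer tanh network f(x; w): R^M -> R with analytic Jacobians.

    Illustrative stand-in for the feedforward model in the paper; the layer
    sizes and flat parameter layout are assumptions of this sketch.
    """
    def __init__(self, M, H, seed=0):
        rng = np.random.default_rng(seed)
        self.M, self.H = M, H
        # Flat parameter vector w = [W1 (H*M), b1 (H), W2 (H), b2 (1)].
        self.w = 0.1 * rng.standard_normal(H * M + 2 * H + 1)

    def _unpack(self):
        M, H = self.M, self.H
        W1 = self.w[:H * M].reshape(H, M)
        b1 = self.w[H * M:H * M + H]
        W2 = self.w[H * M + H:H * M + 2 * H]
        b2 = self.w[-1]
        return W1, b1, W2, b2

    def forward(self, x):
        W1, b1, W2, b2 = self._unpack()
        h = np.tanh(W1 @ x + b1)
        return float(W2 @ h + b2), h

    def jac_x(self, x):
        """df/dx, shape (M,): first row of the linearization A in eq. (7)."""
        W1, _, W2, _ = self._unpack()
        _, h = self.forward(x)
        return (W2 * (1.0 - h ** 2)) @ W1

    def jac_w(self, x):
        """df/dw, shape (n_w,): used to form H(k) in eq. (18)."""
        _, _, W2, _ = self._unpack()
        _, h = self.forward(x)
        delta = W2 * (1.0 - h ** 2)              # output sensitivity of each hidden unit
        dW1 = np.outer(delta, x).ravel()         # gradients w.r.t. W1, b1, W2, b2
        return np.concatenate([dW1, delta, h, np.ones(1)])


def dual_ekf_step(net, x_hat, P_x, P_w, y_k, var_v, var_n):
    """One combined time step of the dual EKF, eqs. (6)-(10) and (13)-(17)."""
    M = len(x_hat)
    C = np.zeros((1, M)); C[0, 0] = 1.0
    B = C.T

    # --- EKF1: state estimation, eqs. (6)-(10) ---
    f_val, _ = net.forward(x_hat)
    x_pred = np.concatenate([[f_val], x_hat[:-1]])       # eq. (6): time update
    A = np.zeros((M, M))
    A[0, :] = net.jac_x(x_hat)                           # linearization of f
    A[1:, :-1] = np.eye(M - 1)                           # delay-line shift structure
    P_x_pred = A @ P_x @ A.T + var_v * (B @ B.T)         # eq. (7)
    K = P_x_pred @ C.T / (C @ P_x_pred @ C.T + var_n)    # eq. (8)
    P_x_new = (np.eye(M) - K @ C) @ P_x_pred             # eq. (9)
    x_new = x_pred + (K * (y_k - x_pred[0])).ravel()     # eq. (10): measurement update

    # --- EKF2: weight estimation, eqs. (13)-(17); (13)-(14) are implicit ---
    H = net.jac_w(x_hat)[None, :]                        # eq. (18), approximated
    K_w = P_w @ H.T / (H @ P_w @ H.T + var_n + var_v)    # eq. (15)
    P_w_new = (np.eye(len(net.w)) - K_w @ H) @ P_w       # eq. (16)
    net.w = net.w + (K_w * (y_k - f_val)).ravel()        # eq. (17)

    return x_new, P_x_new, P_w_new
```

Run once per sample over a noisy sequence (for instance the y_noisy produced by the earlier simulation sketch, with x̂(0) = 0, P_x = I, and P_w = 0.1 I as arbitrary initial choices), the first component of x̂(k) serves as the running estimate of the clean speech sample while the second filter refines the network weights in parallel.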





Publication date: 1997